The 6th IEEE International Conference on Data Science and Advanced Analytics

5–8 October 2019
Washington DC — USA

Data and Information Quality: Toward Better Data Science

Keynote Speaker: Divesh Srivastava

Head of Database Research, AT&T Labs

Speaker Bio:
Divesh Srivastava is the Head of Database Research at AT&T Labs-Research. He is a Fellow of the Association for Computing Machinery (ACM), the Vice President of the VLDB Endowment, a member of the ACM Publications Board, and an associate editor of the ACM Transactions on Data Science (TDS). He has served as the managing editor of the Proceedings of the VLDB Endowment (PVLDB), as associate editor of the ACM Transactions on Database Systems (TODS), and as associate Editor-in-Chief of the IEEE Transactions on Knowledge and Data Engineering (TKDE). He has presented keynote talks at several international conferences, and his research interests and publications span a variety of topics in data management. He received his Ph.D. from the University of Wisconsin, Madison, USA, and his Bachelor of Technology from the Indian Institute of Technology, Bombay, India.

Title: Towards High-Quality Big Data for Responsible Data Science
Abstract:
Data are being generated, collected and analyzed today at an unprecedented scale, and data-driven decision making is sweeping through all aspects of society. As the use of big data has grown, so too have concerns that poor quality data, prevalent in large data sets, can have serious adverse consequences on data-driven decision making. Responsible data science thus requires a recognition of the importance of veracity, the fourth “V” of big data. In this talk, we present a vision of high-quality big data and highlight the substantial challenges that the first three “V”s, volume, velocity and variety, bring to dealing with veracity in big data. Due to the volume and velocity of data, one needs to understand and possibly repair poor quality data in a scalable and timely manner. With the variety of data, often from a diversity of sources, data quality rules cannot be specified a priori; one needs to let the data “speak for itself.” We conclude with some recent results relevant to big data quality that are cause for optimism.

Special Session Aims and Scope:

Data quality is one of the main pillars of data science, ensuring the validity of the analytics, outcomes and inferences that drive important decision making in a wide range of applications: self-driving cars, e-commerce, high-frequency trading, social media feeds, network traffic routing and many others. In this session we focus on the importance of data quality across a broad spectrum of data science endeavors. Many data quality issues, e.g., missing values (gaps), incomplete data (abnormally low counts) and duplicates (abnormally high counts), can introduce unintended bias and variability into the data, with potentially life-changing impact, e.g., in the justice system or in healthcare applications.

Once data quality issues have been identified, managing them, which involves interpreting them, prioritizing them and singling out the actionable ones, is a non-trivial and important research topic. Over-treating the issues can statistically distort the original data and change its very nature, while leaving them untreated can lead to bad data-driven decisions down the road. This special session aims to bring together data science researchers and practitioners who are interested in the theory, methodology, applications, case studies and practical solutions related to data quality.

MONDAY, October 7th, 09:00 – 12:30

09:00-09:15
Welcome and introduction, Tamraparni Dasu
09:15-10:30
“Keynote: Towards High-Quality Big Data for Responsible Data Science”
10:30-11:00
BREAK
11:00-11:30
Range Analysis and Applications to Root Causing
Zurab Khasidashvili and Adam J Norman
11:30-12:00
Sensor-Based Human Activity Mining Using Dirichlet Process Mixtures of Directional Statistics Models
Lei Fang, Juan Ye and Simon Dobson
12:00-12:30
Truth Discovery from Multi-Sourced Text Data Based on Ant Colony Optimization
Chen Chang, Jianjun Cao, Guojun Lv and Nianfeng Weng

Organizers:

Tamraparni Dasu, AT&T Labs-Research

Yaron Kanza, AT&T Labs-Research

© Copyright | DSAA 2019